
How degenerate is the parametrization of neural networks with the ReLU activation function?

Neural Information Processing Systems

Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the main loss function generally depends only on the realization of the neural network, i.e., the function it computes. Studying the optimization problem over the space of realizations opens up new ways to understand neural network training. In particular, common loss functions like mean squared error and categorical cross entropy are convex on spaces of neural network realizations, which are themselves non-convex. The approximation capabilities of neural networks can be used to deal with the latter non-convexity, which allows us to establish that, for sufficiently large networks, local minima of a regularized optimization problem on the realization space are almost optimal.
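The degeneracy in the title refers to the fact that many distinct parameter choices realize the same function. A standard illustration (not taken from the paper itself, but a well-known property of ReLU networks) is the positive homogeneity of the ReLU: relu(a*z) = a*relu(z) for a > 0, so rescaling the hidden-layer weights by a and the output weights by 1/a leaves the realization unchanged. A minimal sketch for a one-hidden-layer network:

```python
import numpy as np

# One-hidden-layer ReLU network: f(x) = v . relu(W x)
def realization(W, v, x):
    return v @ np.maximum(W @ x, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # hidden-layer weights
v = rng.normal(size=3)        # output-layer weights
x = rng.normal(size=2)        # an arbitrary input point

# Positive homogeneity of ReLU: relu(a*z) = a*relu(z) for a > 0, hence
# the rescaled parameters (a*W, v/a) realize exactly the same function.
a = 5.0
out_original = realization(W, v, x)
out_rescaled = realization(a * W, v / a, x)
print(np.allclose(out_original, out_rescaled))  # True: distinct parameters, identical realization
```

This is why the loss, viewed as a function of the parameters, has large flat degenerate regions, whereas viewed as a function of the realization it can be convex.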





Reviews: How degenerate is the parametrization of neural networks with the ReLU activation function?

Neural Information Processing Systems

I read the author response and the other reviews. The author response provides a nice additional demonstration of the implications of connecting the two problems via inverse stability. This is an interesting and potentially important paper for future research on this topic. The paper explains the definition of inverse stability, proves its implications for neural network optimization, exhibits failure modes in which inverse stability does not hold, and proves inverse stability for a simple one-hidden-layer network with a single output. Originality: the paper definitely opens a very interesting and unique research direction.



Authors: Elbrächter, Dennis Maximilian; Berner, Julius; Grohs, Philipp